Big Data Startups funded by Y Combinator (YC) 2026

April 2026

Browse 25 of the top Big Data startups funded by Y Combinator.

We also have a Startup Directory where you can search through over 5,000 companies.

  • Librar Labs
    Y Combinator W2026
    Active • 3 employees • San Francisco, CA, USA
    Backed by top engineers from OpenAI, Scale AI, and Palantir, Librar Labs is building AI that understands human culture. Our first commercial product is for the literature industry, with a focus on school libraries: an AI-native OS for libraries that speeds up the manual, tedious work of librarians. We are the fastest-growing library tech company in history.
    infrastructure
    machine-learning
    big-data
    ai
  • Parse
    Y Combinator F2025
    Active • 2 employees
    Parse lets developers build APIs for any website. Unlike traditional scraping tools that rely on slow, expensive headless browsers, Parse examines websites' underlying network requests and generates API endpoints automatically. This makes Parse 10-100x faster and cheaper than browser automation.
    developer-tools
    api
    big-data
  • Sciloop
    Y Combinator F2025
    Active • 2 employees • San Francisco, CA, USA
    Sciloop creates expert-level math and physics problems that frontier AI models can't solve, then sells the data to AI labs for training and evaluation. Our problems are created by IPhO and IMO medalists — the top 0.01% of STEM talent globally. On our benchmark, models like GPT 5.4 Pro and Gemini 3.1 Pro score 0-5% on our hardest problems. We work with AI labs to supply continuous, fresh training data that pushes the frontier of mathematical and scientific reasoning. Founded by Bilal and Osman, International Physics Olympiad medalists from MIT with hands-on ML research experience at MIT CSAIL.
    artificial-intelligence
    big-data
    data-labeling
    marketplace
  • Pleom
    Y Combinator S2025
    Active • 0 employees
    Automatic insights on all your company data. Go from business workflows to visualizations faster, with zero learning curve.
    generative-ai
    artificial-intelligence
    ai
    analytics
    big-data
  • Liva AI
    Y Combinator S2025
    Active • 2 employees • San Francisco, CA, USA
    Speech models trained on internet data still fall short on realism. We solve this by collecting targeted training data for model labs. We hope to create a world where AI feels more human.
    b2b
    big-data
    data-labeling
    marketplace
    artificial-intelligence
  • ParaQuery
    Y Combinator P2025
    Active • 1 employee • New York, NY, USA
    ParaQuery is a fully-managed GPU-accelerated Spark solution providing double the performance at half the cost compared to solutions like Databricks and BigQuery. It is fully Spark-compatible, cloud agnostic, and has seamless integrations everywhere, all without vendor lock-in. ParaQuery makes it trivial to run large SQL and Spark workloads efficiently and serverlessly, without any data migrations.
    analytics
    infrastructure
    developer-tools
    big-data
    enterprise-software
  • AfterQuery
    Y Combinator W2025
    Active • 30 employees • San Francisco, CA, USA
    AfterQuery is an applied research lab curating data solutions for frontier foundation model development, serving every frontier AI lab.
    b2b
    artificial-intelligence
    ai
    big-data
    data-labeling
  • Archil
    Y Combinator F2024
    Active • 11 employees • San Francisco, CA, USA
    Archil transforms S3 buckets into a 30x faster, unlimited, local disk. Archil enables AI, analytics, and serverless applications to instantly access massive data sets without waiting for data transfer. Researchers use Archil for shareable, local storage of data sets and model versions that never runs out of capacity.
    developer-tools
    infrastructure
    machine-learning
    big-data
    ai
  • Syntra
    Y Combinator S2024
    Active • 8 employees • San Francisco, CA, USA
    Syntra is building the agentic operating layer for private medical practices, starting with insurance claims QA. We help clinics generate more revenue per clinical encounter while reducing administrative overhead by up to 60%. Syntra supports hundreds of specialty practices nationwide and processes hundreds of millions of dollars in medical claim charges annually.
    ai
    big-data
    healthcare-it
  • Voker
    Y Combinator S2024
    Active • 6 employees • Los Angeles, CA, USA
    Voker is the Agent Analytics Platform for monitoring and improving your AI agents. Companies like Dutch.com use our SDK to build better agents. Alex and Tyler met at a high-growth e-commerce startup where Tyler was running Technology & Data. Together, they built AI products that bootstrapped the company profitably to $100MM in revenue.
    analytics
    big-data
    developer-tools
    ai
    generative-ai
  • Sharpe
    Y Combinator S2024
    Active • 3 employees • San Francisco, CA, USA
    Sharpe helps traders go from idea to profit in minutes with AI, bundling petabytes of market data with high-performance infrastructure. Sharpe helped a top 5 quant firm find $10m+ of profit within a few hours of deployment.
    artificial-intelligence
    finance
    analytics
    data-engineering
    big-data
  • Upsolve AI
    Y Combinator W2024
    Active • 5 employees
    Upsolve AI is a customer-facing analytics-as-a-service platform. We are building a full data stack that enables businesses to build and offer analytics to their customers at lightning speed, and gives those customers the superpower to answer any data question via AI. The company was founded by Ka Ling Wu and Serguei Balanovich, who previously built a similar product at Palantir (featured in Palantir's S-1), growing it to 50+ enterprise customers and 8 figures of annual revenue in 2 years.
    analytics
    developer-tools
    big-data
    artificial-intelligence
    b2b
  • CambioML
    Y Combinator S2023
    Active • 3 employees
    Energent.ai turns every knowledge worker into a top performer by pairing them with an AI teammate. We streamline manual workflows without integrations, democratize automation with zero code, and move beyond passive chatbots to deliver real, visible output — all securely within the enterprise environment.
    ai-assistant
    automation
    big-data
    productivity
  • NewsCatcher
    Y Combinator S2022
    Active • 22 employees • Kyiv, Ukraine, 02000
    CatchAll by NewsCatcher is a recall-first web search API built for queries where the results are spread across hundreds or thousands of pages on the web. Instead of returning the top ranked links like traditional search engines, CatchAll retrieves a large candidate set from the web, validates which pages actually match the query, and extracts structured records of real-world events. Developers and data teams use CatchAll to answer “long-list” questions such as tracking regulatory actions, funding rounds, product launches, corporate expansions, or cybersecurity incidents. The output is not just links but clean, deduplicated datasets that can power AI agents, monitoring systems, analytics pipelines, and market intelligence workflows. CatchAll runs on the data infrastructure developed by NewsCatcher, which continuously indexes millions of articles and public web pages across a global network of sources.
    big-data
    enterprise
    enterprise-software
    saas
    artificial-intelligence
  • Endla
    Y Combinator S2021
    Active • 8 employees • Brisbane QLD, Australia
    Endla increases the value of oil & gas wells by providing software that helps design the well that maximizes ROI. Our product, AlphaSpace, optimizes the well design by automatically producing many high-quality options, which the engineer can then measure against their business objectives (auto-design). Software that finds the optimal solution frees the engineer to work a layer up: understanding the problem and specifying the important objectives. Our vision is to make auto-design and auto-operation software part of the workflow for every engineer working with physical assets.
    energy
    big-data
    analytics
  • Supabase
    Y Combinator S2020
    Active • 120 employees • San Francisco, CA, USA
    Supabase is the easiest way to get started with Postgres. Each project within Supabase is an isolated Postgres cluster, allowing customers to scale independently while still providing the features you need to build: instant database setup, auth, row-level security, realtime data streams, auto-generated APIs, and a simple-to-use web interface. We are 100% remote.
    open-source
    databases
    data-engineering
    big-data
    developer-tools
  • Gecko Robotics
    Y Combinator W2016
    Active • 230 employees • Pittsburgh, PA, USA
    Gecko Robotics is the pioneer of AI + Robotics [AIR technology], transforming how the world builds, operates, and maintains its most critical infrastructure for a more reliable and sustainable future. Using fixed sensors and robots that climb, crawl, swim, and fly, we combine first-order data layers with the predictive power of AI into a single source of truth for the physical world. Cantilever™ is our operating platform, powered by AIR technology, that empowers teams to achieve operational excellence through actionable data for immediate and long-term planning.
    big-data
    energy
    robotics
    data-engineering
    artificial-intelligence
  • Deasy Labs
    Y Combinator S2023
    Acquired • 8 employees
    Deasy Labs was acquired in July 2025 by Collibra, a global leader in enterprise data governance. Deasy Labs provides metadata orchestration for AI workflows. Deasie's platform provides the best way for AI teams to create and embed high-quality, customized metadata into their AI workflows (e.g., RAG, agentic frameworks). Our three founders (from Amazon, McKinsey/QuantumBlack & MIT) previously built an ML data governance tool from 0 to 1 within McKinsey, which we deployed with 11 Fortune 500 companies. In early 2023, we saw that the ability to create high-quality metadata (without reliance on domain experts) would be a key factor in achieving the accuracy and speed that production GenAI applications require. Our investors include General Catalyst, Y Combinator, RTP Global and world experts in enterprise data. Website: https://deasylabs.com
    ai-assistant
    data-labeling
    databases
    big-data
    artificial-intelligence
  • Tarsal
    Y Combinator S2021
    Acquired • 10 employees • New York, NY, USA
    Tarsal is a data pipeline custom built for security teams. As security data grows 25% year over year, security teams desperately need access to best-in-class data infrastructure. Tarsal bridges the gap between the modern data stack and security teams, pioneering the modern security data stack.
    cybersecurity
    data-engineering
    big-data
    b2b
  • Terark
    Y Combinator W2017
    Acquired • 2 employees • Beijing, China
    Terark built a new storage engine for databases and data systems. Our technology enables direct search on highly compressed data, with 200X faster read performance and more than 10X storage savings (better than Google's LevelDB and Facebook's RocksDB), delivering greater scalability at lower cost for big data applications. Alibaba is a paying customer, and we are a Y Combinator company.
    big-data
    cloud-computing
  • Scuba
    Y Combinator W2013
    Acquired • 51 employees
    Scuba is the fast and scalable event-based analytics solution to answer critical business questions about how customers behave and products are used. Interana allows users to analyze and explore the key business metrics that matter most in a data-driven world – such as growth, retention, conversion and engagement – in seconds, rather than the hours or days it often takes with existing solutions. Interana allows customers to discover and investigate these key insights easily through its visual and interactive interface, which makes data analysis a natural extension of everyone’s workflow.
    analytics
    big-data
    data-engineering
    data-visualization
  • Mattermark
    Y Combinator S2012
    Acquired • 11 employees • San Francisco, CA, USA
    At Mattermark, we’re accelerating sales and deal making through data and automation. Mattermark collects and organizes comprehensive information on the world’s fastest growing companies. In minutes, get actionable data that lets you pinpoint the companies and people you need to know or do business with. Today, over 500 companies use Mattermark to discover high quality leads, prioritize prospects and increase conversion rates.
    big-data
    investing
  • Citus Data
    Y Combinator S2011
    Acquired • 45 employees • San Francisco, CA, USA
    Businesses spend far too much time on their databases. Citus is fixing this problem. Citus is worry-free Postgres. Built to scale out, Citus is an extension to Postgres that is available as open source, as enterprise software that can be run on-prem or on any cloud, and as a fully-managed database as a service. Whether you have a multi-tenant application that needs to scale out, or you need performance for your real-time analytics customers, with Citus, you can focus on your app—not your database. Founded in 2011, Citus Data is venture backed by Khosla Ventures and Data Collective. Citus is a Y Combinator alumnus and has offices in San Francisco’s SoMa district and Istanbul, Turkey. At Citus, we make it simple to scale out Postgres. Citus Data online: www.citusdata.com Documentation: docs.citusdata.com GitHub: github.com/citusdata/citus
    databases
    big-data
    open-source
  • Amiato
    Y Combinator W2012
    Acquired • 2 employees • Palo Alto, CA, USA
    Amiato's real-time integration service moves your data to where it's most valuable to you. Today's flexible databases like MongoDB and Couchbase let agile businesses accelerate and scale their operations, but analyzing their data has been elusive. Amiato unlocks the value of that unstructured data by bridging the gap to familiar tools in the rich structured world of BI, all with zero setup work. Run reports, do interactive ad-hoc analysis, and combine disparate silos. Our Schema-lift (TM) technology allows you to immediately integrate new endpoints and seamlessly keep up with changes in your data. We get customers up and running in a day instead of weeks, and let them focus on business instead of wrestling with infrastructure.
    big-data
    analytics